Document image decoding in the UC Berkeley Digital Library
Author
Abstract
The UC Berkeley Environmental Digital Library Project is one of six university-led projects that were initiated in the fall of 1994 as part of a four-year digital library initiative sponsored by the NSF, NASA and ARPA. The Berkeley project is particularly interesting from a document image analysis perspective because its testbed collection consists almost entirely of scanned materials. As a result, the Berkeley project is making extensive use of document recognition and other image analysis technology to provide content-based access to the collection. The Document Image Decoding (DID) group at Xerox PARC is a member of the Berkeley team and is investigating the application of DID techniques to providing high-quality (accurate and properly structured) transcriptions of scanned documents in the collection. This paper briefly describes the Berkeley project, discusses some of its recognition requirements and presents examples of online structured documents created using DID technology.
Similar resources
Re-Inventing Scholarly Information Dissemination and Use
Our practice of disseminating, accessing and using information, especially scholarly information, is still largely informed by the nature of pre-electronic media. For example, journals still exist in their traditional forms partly because of the value of the peer review process, which thus far has not yielded to decentralized, distributed and timely alternatives. Similarly, information access i...
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
A Novel Patch-Based Digital Signature
In this paper, a new patch-based digital signature (DS) is proposed. Like steganography methods, the proposed approach hides the secure message in a host image. However, like cryptography approaches, it uses a patch-based key to encode and decode the data. Both the host image and the key patches are randomly initialized. The proposed approach consists of encoding and decoding algorithms. The encodin...
Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)
Document images produced by scanners or digital cameras usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions with the aim of improving OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...
Optimum decoder for multiplicative spread spectrum image watermarking with Laplacian modeling
This paper investigates the multiplicative spread spectrum watermarking method for images. The information bit is spread into the middle-frequency Discrete Cosine Transform (DCT) coefficients of each block of an image using a generated pseudo-random sequence. Unlike the conventional signal modeling, we suppose that both signal and noise are distributed with Laplacian distribution, because the ...